OpenCLIP provides flexible model loading with support for pretrained weights, custom configurations, and multiple storage backends.

Basic Model Loading

create_model()

The core function for creating CLIP models with flexible configuration options.
import open_clip

model = open_clip.create_model(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    precision='fp16'
)
model_name
str
required
Model architecture name (e.g., 'ViT-B-32', 'RN50') or schema-prefixed path:
  • Built-in: 'ViT-B-32'
  • HuggingFace Hub: 'hf-hub:org/repo'
  • Local directory: 'local-dir:/path/to/model'
pretrained
str
Source of the pretrained weights. One of:
  • Tag name (e.g., 'openai', 'laion2b_s34b_b79k')
  • Local file path (e.g., '/path/to/weights.pt')
  • Ignored if model_name uses a schema prefix
device
str | torch.device
default:"cpu"
Device to load the model on ('cpu', 'cuda', etc.)
precision
str
default:"fp32"
Model precision: 'fp32', 'fp16', 'bf16', 'pure_fp16', 'pure_bf16'
jit
bool
default:"False"
Whether to JIT compile the model
force_image_size
int | Tuple[int, int]
Override default image size for the model
cache_dir
str
Directory for caching downloaded weights
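
The three model_name forms listed above differ only in their prefix. As a rough illustration (not OpenCLIP's actual parser), a hypothetical helper named split_schema could separate the prefix from the identifier as follows. The helper name and the KNOWN_SCHEMAS tuple are assumptions made for this sketch.

```python
from typing import Optional, Tuple

# Prefixes named in this guide, treated here as plain string markers.
KNOWN_SCHEMAS = ("hf-hub", "local-dir")


def split_schema(model_name: str) -> Tuple[Optional[str], str]:
    """Split 'schema:identifier' into (schema, identifier).

    Hypothetical helper for illustration only. A name without a known
    prefix is returned as (None, model_name), i.e. a built-in architecture.
    """
    for schema in KNOWN_SCHEMAS:
        prefix = schema + ":"
        if model_name.startswith(prefix):
            return schema, model_name[len(prefix):]
    return None, model_name


print(split_schema("ViT-B-32"))
print(split_schema("hf-hub:org/repo"))
print(split_schema("local-dir:/path/to/model"))
```

A check like this can be useful before calling create_model(), for example to warn that a pretrained argument will be ignored when a schema prefix is present.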

Loading Schemas

HuggingFace Hub

Load models directly from HuggingFace Hub using the hf-hub: schema:
model = open_clip.create_model(
    'hf-hub:laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K',
    device='cuda'
)
The function automatically:
  • Downloads open_clip_config.json from the repo
  • Looks for weight files (.safetensors, .bin, .pth)
  • Merges preprocessing configuration

Local Directory

Load from a local directory containing model config and weights:
model = open_clip.create_model(
    'local-dir:/path/to/my/model',
    device='cuda'
)
The local directory must contain:
  • open_clip_config.json with model configuration
  • Weight file (searched in order): open_clip_model.safetensors, pytorch_model.bin, model.pth, etc.
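
The directory requirements above can be verified before loading. The following is a minimal sketch of such a pre-flight check, assuming only the file names mentioned in this guide; the function check_local_model_dir is hypothetical and is not part of OpenCLIP, and the library's real search list may contain more candidates.

```python
import json
from pathlib import Path
from typing import Optional

# Only the file names mentioned in this guide; the real search list may be longer.
CONFIG_NAME = "open_clip_config.json"
WEIGHT_CANDIDATES = ("open_clip_model.safetensors", "pytorch_model.bin", "model.pth")


def check_local_model_dir(model_dir: str) -> Optional[Path]:
    """Return the first weight file found, or None if the directory is incomplete.

    Hypothetical pre-flight check; it does not load the model.
    """
    root = Path(model_dir)
    config_path = root / CONFIG_NAME
    if not config_path.is_file():
        return None
    # Fail early on malformed JSON rather than at model construction time.
    json.loads(config_path.read_text(encoding="utf-8"))
    # Candidates are tried in the order listed above.
    for name in WEIGHT_CANDIDATES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None
```

If the function returns a path, the directory is a reasonable candidate for the local-dir: schema; a return value of None means the config or all listed weight files are missing.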

Local File Path

Load weights from a specific file:
model = open_clip.create_model(
    'ViT-B-32',
    pretrained='/path/to/checkpoint.pt',
    device='cuda'
)

Advanced Loading Options

Tower-Specific Weights

Load separate weights for image and text towers:
model = open_clip.create_model(
    'ViT-B-32',
    pretrained_image=True,  # Load default ImageNet weights
    pretrained_text=True,   # Load default LM weights
    pretrained_image_path='/path/to/vision.pt',  # Override with custom weights
    pretrained_text_path='/path/to/text.pt'
)
pretrained_image
bool
default:"False"
Load default pretrained weights for image tower (timm models)
pretrained_text
bool
default:"True"
Load default pretrained weights for text tower (HuggingFace models)
pretrained_image_path
str
Path to custom image tower weights. These are loaded after the full-model weights, so they take precedence for the image tower.
pretrained_text_path
str
Path to custom text tower weights. These are loaded after the full-model weights, so they take precedence for the text tower.

Custom Model Configuration

Override model architecture parameters:
model = open_clip.create_model(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    force_quick_gelu=True,
    force_patch_dropout=0.5,
    force_image_size=336,
    force_context_length=128
)

create_model_and_transforms()

Convenience function that returns the model together with its training and validation preprocessing transforms:
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    precision='fp16'
)

# Use transforms
from PIL import Image
image = Image.open('example.jpg')
image_tensor = preprocess_val(image)
Returns a tuple of (model, train_transform, val_transform). The transforms handle:
  • Image resizing and cropping
  • Normalization with correct mean/std
  • Data augmentation (training only)
Call model.eval() before inference. Models are returned in training mode by default, which changes the behavior of layers such as BatchNorm and stochastic depth.

create_model_from_pretrained()

Requires pretrained weights and raises an error if they cannot be loaded:
model, preprocess = open_clip.create_model_from_pretrained(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    return_transform=True
)
return_transform
bool
default:"True"
Whether to return the preprocessing transform. If False, only the model is returned.
This is the recommended function for inference use cases where pretrained weights are essential.

Listing Available Models

import open_clip

# List all model architectures
architectures = open_clip.list_models()
print(architectures)  # ['RN50', 'RN101', 'ViT-B-32', 'ViT-L-14', ...]

# List all pretrained weights
pretrained = open_clip.list_pretrained()
for model_name, tag in pretrained:
    print(f"{model_name}:{tag}")

# List pretrained weights as strings
pretrained_str = open_clip.list_pretrained(as_str=True)
# ['RN50:openai', 'RN50:yfcc15m', 'ViT-B-32:laion2b_s34b_b79k', ...]
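
To see which tags exist for a given architecture, the (model_name, tag) pairs can be grouped by model. The sketch below uses a hypothetical helper, group_tags_by_model, on a small sample copied from the listing comment above; in practice the pairs would come from open_clip.list_pretrained().

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_tags_by_model(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (model_name, tag) pairs into {model_name: [tag, ...]}."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for model_name, tag in pairs:
        grouped[model_name].append(tag)
    return dict(grouped)


# Sample pairs taken from the listing comment; not the full registry.
sample = [("RN50", "openai"), ("RN50", "yfcc15m"), ("ViT-B-32", "laion2b_s34b_b79k")]
tags = group_tags_by_model(sample)
print(tags["RN50"])  # ['openai', 'yfcc15m']
```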

Weight Loading Options

load_weights
bool
default:"True"
Whether to load the resolved pretrained weights. Set to False for random initialization.
require_pretrained
bool
default:"False"
Raise an error if pretrained weights cannot be loaded.
weights_only
bool
default:"True"
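Pass weights_only=True to torch.load. This is safer because it restricts unpickling to tensors and basic types, which guards against arbitrary code execution from untrusted checkpoints.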

Complete Example

import torch
import open_clip
from PIL import Image

# Load model with transforms
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14',
    pretrained='datacomp_xl_s13b_b90k',
    device='cuda',
    precision='fp16',
    force_image_size=224
)
model.eval()

# Get tokenizer
tokenizer = open_clip.get_tokenizer('ViT-L-14')

# Prepare inputs
image = preprocess(Image.open('cat.jpg')).unsqueeze(0).cuda()
text = tokenizer(["a cat", "a dog"]).cuda()

# Inference
with torch.no_grad(), torch.autocast(device_type='cuda'):  # torch.cuda.amp.autocast() is deprecated in recent PyTorch
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    # Normalize features
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    
    # Compute similarity
    similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    print("Similarity:", similarity)
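
The final step of the example (normalize, scale by 100, softmax) is a cosine similarity turned into probabilities over the text prompts. As a plain-Python sketch of the same arithmetic, the block below applies it to toy 2-D vectors invented for illustration; the function names are hypothetical and real embeddings have hundreds of dimensions.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_probabilities(
    image_vec: List[float], text_vecs: List[List[float]], scale: float = 100.0
) -> List[float]:
    """Cosine similarity between one image and each text, scaled and softmaxed."""
    img = l2_normalize(image_vec)
    logits = []
    for text_vec in text_vecs:
        txt = l2_normalize(text_vec)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    return softmax(logits)


# Toy embeddings: the first text points the same way as the image (cosine 1.0),
# the second is slightly off (cosine 0.96).
probs = match_probabilities([3.0, 4.0], [[3.0, 4.0], [4.0, 3.0]])
print(probs)
```

With a scale of 100, a cosine gap of only 0.04 becomes a logit gap of 4, so the softmax already assigns about 98% of the probability to the closer prompt. This is why small differences in cosine similarity produce confident predictions.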